Parallelizing Word2Vec in Shared and Distributed Memory
Authors
Abstract
Word2Vec is a widely used algorithm for extracting low-dimensional vector representations of words. It recently generated considerable excitement in the machine learning and natural language processing (NLP) communities due to its exceptional performance in many NLP applications such as named entity recognition, sentiment analysis, machine translation and question answering. State-of-the-art algorithms, including those by Mikolov et al., have been parallelized for multi-core CPU architectures, but they are based on vector-vector operations that are memory-bandwidth intensive and do not efficiently use computational resources. In this work, we improve reuse of various data structures in the algorithm through the use of minibatching, hence allowing us to express the problem using matrix multiply operations. We also explore different techniques to parallelize word2vec computation across nodes in a compute cluster, and demonstrate good strong scalability up to 32 nodes. In combination, these techniques allow us to scale up the computation nearly linearly across cores and nodes, and process hundreds of millions of words per second, making this, to the best of our knowledge, the fastest word2vec implementation.
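To make the minibatching idea in the abstract concrete, the sketch below shows, in NumPy, how sharing one set of negative samples across a minibatch of input words turns the per-pair dot products of skip-gram negative sampling into small dense matrix multiplies. This is a minimal illustration under assumed names and sizes (W_in, W_out, B, K, learning rate), not the authors' implementation.

```python
# Minimal sketch: batched skip-gram negative-sampling (SGNS) update where a
# minibatch of input words shares a single target word and a single set of
# negative samples, so scores and gradients become matrix multiplies
# (BLAS level 3) instead of many vector-vector operations.
# All names and hyperparameters here are illustrative assumptions.
import numpy as np

def sgns_minibatch_update(W_in, W_out, input_ids, target_id, neg_ids, lr=0.025):
    X = W_in[input_ids]                     # (B, D) minibatch of input vectors
    C = W_out[np.r_[target_id, neg_ids]]    # (1+K, D) target + shared negatives

    labels = np.zeros((len(input_ids), len(neg_ids) + 1))
    labels[:, 0] = 1.0                      # first column is the positive pair

    scores = X @ C.T                        # (B, 1+K): one GEMM replaces B*(1+K) dot products
    probs = 1.0 / (1.0 + np.exp(-scores))   # sigmoid
    err = lr * (labels - probs)             # (B, 1+K) scaled gradient coefficients

    # Gradient updates are again matrix multiplies.
    # (Duplicate indices within a minibatch are ignored here for simplicity.)
    W_in[input_ids] += err @ C                      # (B, D)
    W_out[np.r_[target_id, neg_ids]] += err.T @ X   # (1+K, D)

# Toy usage with random data.
rng = np.random.default_rng(0)
V, D, B, K = 1000, 100, 16, 5               # vocab, dim, minibatch size, negatives
W_in = rng.normal(scale=0.01, size=(V, D))
W_out = np.zeros((V, D))
sgns_minibatch_update(W_in, W_out,
                      input_ids=rng.integers(0, V, size=B),
                      target_id=int(rng.integers(0, V)),
                      neg_ids=rng.integers(0, V, size=K))
```

Because the score and gradient computations are expressed as dense matrix products, they map onto optimized GEMM kernels and reuse the shared output vectors across the whole minibatch, which is the data-reuse benefit the abstract describes.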
Similar Resources
Experiments with Parallelizing a Tribology Application
Different parallelization methods vary in their system requirements, programming styles, efficiency of exploring parallelism, and the application characteristics they can handle. Different applications can exhibit totally different performance gains depending on the parallelization method used. This paper compares OpenMP, MPI, and Strings (a distributed shared memory system) for parallelizing a compli...
Communication Analysis for Shared and Distributed Memory Machines
Advances in programming languages and parallelizing compilers are making parallel computers easier to use by providing a high-level portable programming model that protects software investment. However, experience has shown that simply finding parallelism is not always sufficient for obtaining good performance from today’s multiprocessors, largely because the cost of interprocessor communicatio...
Parallelization of NAS Benchmarks for Shared Memory Multiprocessors
This paper presents our experiences of parallelizing the sequential implementation of NAS benchmarks using compiler directives on SGI Origin2000 distributed shared memory (DSM) system. Porting existing applications to new high performance parallel and distributed computing platforms is a challenging task. Ideally, a user develops a sequential version of the application, leaving the task of port...
Automatic Localization for Distributed-Memory Multiprocessors Using a Shared-Memory Compilation Framework
In this paper, we outline an approach for compiling for distributed-memory multiprocessors that is inherited from compiler technologies for shared-memory multiprocessors. We believe that this approach to compiling for distributed-memory machines is promising because it is a logical extension of the shared-memory parallel programming model, a model that is easier for programmers to work with, an...
Journal title: CoRR
Volume: abs/1604.04661
Pages: -
Publication date: 2016